@openinx (Member) commented Dec 28, 2020

Many people export the results of Flink aggregations into an Apache Iceberg table, for example:

SELECT DATE(click_timestamp) AS click_date, COUNT(click_num) AS click_num
FROM click_events
GROUP BY DATE(click_timestamp);

This streaming query counts clicks since the beginning of the day (00:00:00); every emitted event is an UPSERT that overwrites the previously accumulated click_num for that day.

In this case, we need to transform every INSERT/UPDATE_AFTER event into an UPSERT, which means a DELETE of the key followed by an INSERT.
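The DELETE + INSERT rewrite described above can be sketched with a minimal, self-contained model. `UpsertSketch`, its `Kind` enum, and the string events are hypothetical stand-ins for Flink's `RowKind` and Iceberg's writer, not the PR's actual API:

```java
import java.util.ArrayList;
import java.util.List;

public class UpsertSketch {
  // Simplified stand-in for Flink's RowKind.
  public enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

  // Rewrites one changelog event into the events an upsert writer would emit:
  // INSERT/UPDATE_AFTER become an equality DELETE of the key followed by an
  // INSERT of the new row.
  public static List<String> toUpsertEvents(Kind kind, String key) {
    List<String> out = new ArrayList<>();
    switch (kind) {
      case INSERT:
      case UPDATE_AFTER:
        out.add("DELETE " + key); // drop any previous row with this key
        out.add("INSERT " + key); // then write the new row
        break;
      case DELETE:
        out.add("DELETE " + key);
        break;
      default:
        // UPDATE_BEFORE can be dropped in upsert mode: the DELETE emitted for
        // the matching UPDATE_AFTER already removes the old row.
        break;
    }
    return out;
  }
}
```

With this rewrite, two consecutive UPDATE_AFTER events for the same day key each emit a DELETE then an INSERT, so only the latest click_num survives in the table.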

@github-actions github-actions bot added the flink label Dec 28, 2020
* All INSERT/UPDATE_AFTER events from the input stream will be transformed into UPSERT events, which means it will
* DELETE the old record and then INSERT the new record. In a partitioned table, the partition fields should be
* a subset of the equality fields; otherwise, an old row located in partition-A could not be deleted by a
* new row located in partition-B.
A reviewer (Contributor) commented:
Does anything validate this constraint?
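The snippets quoted in this thread don't show such a check; one could look like the following. This is a hypothetical sketch over plain field-id collections, not Iceberg's actual `PartitionSpec`/schema API:

```java
import java.util.List;
import java.util.Set;

public class PartitionConstraintSketch {
  // Returns true when every partition source field is also an equality field,
  // so the equality DELETE emitted for a key is guaranteed to land in the
  // same partition as the row it replaces.
  public static boolean partitionFieldsCovered(
      List<Integer> partitionSourceFieldIds, Set<Integer> equalityFieldIds) {
    return equalityFieldIds.containsAll(partitionSourceFieldIds);
  }
}
```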

switch (row.getRowKind()) {
  case INSERT:
  case UPDATE_AFTER:
    if (upsert) {
A reviewer (Contributor) commented:

It seems like this should only happen for the INSERT case because UPDATE_AFTER implies that there was an UPDATE_BEFORE that will perform the delete. This would delete the same row twice in that case, causing more equality deletes to be written for the row.
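The double delete this comment describes can be counted with a small model. It assumes the behavior under review (UPDATE_BEFORE always deletes; in upsert mode, INSERT and UPDATE_AFTER also delete the key before writing); `DoubleDeleteSketch` and its `Kind` enum are hypothetical stand-ins, not the PR's code:

```java
import java.util.List;

public class DoubleDeleteSketch {
  public enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER }

  // Counts the equality deletes emitted for one key's changelog under the
  // assumed behavior.
  public static int equalityDeletes(List<Kind> changelog, boolean upsert) {
    int deletes = 0;
    for (Kind kind : changelog) {
      switch (kind) {
        case UPDATE_BEFORE:
          deletes++; // the retraction itself deletes the old row
          break;
        case INSERT:
        case UPDATE_AFTER:
          if (upsert) {
            deletes++; // upsert mode deletes the key before inserting
          }
          break;
      }
    }
    return deletes;
  }
}
```

For the changelog [UPDATE_BEFORE, UPDATE_AFTER] with upsert enabled, this yields two deletes for the same row, which is the duplication being pointed out; restricting the upsert delete to the INSERT case would yield one.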

@rdblue (Contributor) commented Dec 28, 2020

Mostly looks good, but I don't think that upsert should be supported for UPDATE_AFTER. Interested to hear your rationale for that case.

coolderli pushed a commit to coolderli/iceberg that referenced this pull request Apr 26, 2021
@himanshpal commented:

@rdblue @openinx, is there any update on this? We are currently seeing duplicate rows while writing/compacting CDC events to a table.

@wg1026688210 (Contributor) commented Jul 2, 2021

Can this PR be merged? In one of our scenarios, the TiDB binlog has no before_update record preceding after_update, so we hope Flink can handle the delete for us. @rdblue @openinx

@haormj (Contributor) commented Jul 16, 2021

@openinx @rdblue Is there any update on this?

@rdblue (Contributor) commented Jul 19, 2021

I think this is just waiting on someone to pick it up again. UPSERT should be unblocked now that row identifier fields have been added.

@Reo-LEI (Contributor) commented Jul 25, 2021

I'm picking this up in #2863 @rdblue

@openinx (Member, author) commented Jul 27, 2021

Since @Reo-LEI has picked this work up, I will close this PR now. Let's continue the review on #2863.

@openinx closed this Jul 27, 2021